Skip to content

HTML API: Ensure bookmark exaustion does not error#10616

Closed
sirreal wants to merge 25 commits intoWordPress:trunkfrom
sirreal:html-api/ensure-deep-nesting-no-exception
Closed

HTML API: Ensure bookmark exaustion does not error#10616
sirreal wants to merge 25 commits intoWordPress:trunkfrom
sirreal:html-api/ensure-deep-nesting-no-exception

Conversation

@sirreal
Copy link
Member

@sirreal sirreal commented Dec 10, 2025

Bookmark exhaustion, typically from deep nesting, can cause the HTML Processor to throw an Exception.

Rather than throwing an exception, return false when bookmark exhaustion is detected. An error is set on the processor and processing stops.

Trac ticket: https://core.trac.wordpress.org/ticket/64394


This Pull Request is for code review only. Please keep all other discussion in the Trac ticket. Do not merge this Pull Request. See GitHub Pull Requests for Code Review in the Core Handbook for more details.

Comment on lines +6307 to +6308
* @throws Exception When unable to allocate a bookmark for the next token in the input HTML document.
*
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

bookmark_token() actually throws, but it's not handled here. The annotation may not be appropriate.

* otherwise might involve messier calling and return conventions.
*/
return false;
} catch ( Exception $e ) {
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Exhausted bookmarks throw a generic Exception.

This block catches the exceptions thrown by insert_virtual_token().

@github-actions
Copy link

Test using WordPress Playground

The changes in this pull request can previewed and tested using a WordPress Playground instance.

WordPress Playground is an experimental project that creates a full WordPress instance entirely within the browser.

Some things to be aware of

  • The Plugin and Theme Directories cannot be accessed within Playground.
  • All changes will be lost when closing a tab with a Playground instance.
  • All changes will be lost when refreshing the page.
  • A fresh instance is created each time the link below is clicked.
  • Every time this pull request is updated, a new ZIP file containing all changes is created. If changes are not reflected in the Playground instance,
    it's possible that the most recent build failed, or has not completed. Check the list of workflow runs to be sure.

For more details about these limitations and more, check out the Limitations page in the WordPress Playground documentation.

Test this pull request with WordPress Playground.

Co-authored-by: Weston Ruter <westonruter@gmail.com>
@sirreal sirreal marked this pull request as ready for review January 29, 2026 18:27
@github-actions
Copy link

github-actions bot commented Jan 29, 2026

The following accounts have interacted with this PR and/or linked issues. I will continue to update these lists as activity occurs. You can also manually ask me to refresh this list by adding the props-bot label.

Core Committers: Use this line as a base for the props when committing in SVN:

Props jonsurrell, westonruter, dmsnell.

To understand the WordPress project's expectations around crediting contributors, please review the Contributor Attribution page in the Core Handbook.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR fixes an issue where bookmark exhaustion (typically from deep HTML nesting) causes the HTML Processor to throw an Exception. Instead of throwing exceptions, the processor now returns false when bookmark exhaustion is detected, sets an error state, and stops processing gracefully.

Changes:

  • Modified bookmark_token() to return false instead of throwing an exception when MAX_BOOKMARKS is exceeded
  • Updated all callers of bookmark_token() and insert_virtual_node() to check for and handle false return values
  • Removed outdated @throws documentation from methods that no longer throw exceptions
  • Added comprehensive tests to verify the processor handles extreme nesting without throwing exceptions

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

File Description
src/wp-includes/html-api/class-wp-html-processor.php Core implementation changes: modified bookmark_token() and insert_virtual_node() to return false on failure instead of throwing; added error checks in step(), step_before_html(), step_before_head(), step_after_head(), step_in_table(), and step_in_table_body() methods; updated PHPDoc to remove exception references
tests/phpunit/tests/html-api/wpHtmlProcessor.php Added two comprehensive tests verifying the processor handles bookmark exhaustion gracefully for both regular and virtual tokens; fixed annotation from @group to @ticket

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I definitely don’t love how pervasive and deep our error-checking has become here; did you attempt to use ->bail()? the existing code is throwing because that part of step() isn’t wrapped in a try/catch, right? what if we continue to throw inside bookmark_token() but trap that and handle aborting the same way we handle all deep control flow issues?

@sirreal
Copy link
Member Author

sirreal commented Jan 30, 2026

I definitely don’t love how pervasive and deep our error-checking has become here.

Me neither 😕

did you attempt to use ->bail()?

I did try using bail(). That transforms a bookmark exhaustion error into an unsupported error, which seemed confusing. When bookmarks are exhausted, an error is set. Bail overrides with an unsupported error code.

We could consider having bail leave the last error in case it's already been set.

what if we continue to throw inside bookmark_token() but trap that and handle aborting the same way we handle all deep control flow issues?

I have that in an earlier commit. It wasn't as pervasive (at b02d679):

try {
$bookmark_name = $this->bookmark_token();
} catch ( Exception $e ) {
if ( self::ERROR_EXCEEDED_MAX_BOOKMARKS === $this->last_error ) {
return false;
}
throw $e;
}

} catch ( WP_HTML_Unsupported_Exception $e ) {
/*
* Exceptions are used in this class to escape deep call stacks that
* otherwise might involve messier calling and return conventions.
*/
return false;
} catch ( Exception $e ) {
if ( self::ERROR_EXCEEDED_MAX_BOOKMARKS === $this->last_error ) {
return false;
}
// Rethrow any other exceptions for higher-level handling.
throw $e;
}

Co-authored-by: Weston Ruter <westonruter@gmail.com>
@dmsnell
Copy link
Member

dmsnell commented Jan 30, 2026

I have that in an earlier commit. It wasn't as pervasive

Why did you change it @sirreal? did I ask you to? 🙃 🤦‍♂️

@sirreal
Copy link
Member Author

sirreal commented Jan 30, 2026

No, I was just iterating and exploring. The method actually already had a …|false return type annotation, suggesting it may have intended to return false on failure.

I can't say I like the general Exception catching either, but it works. I don't have a strong preference, would you prefer that other version @dmsnell?

Copy link
Member

@westonruter westonruter left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM, although I defer to the expertise of @dmsnell.

@dmsnell
Copy link
Member

dmsnell commented Jan 31, 2026

@sirreal is there a copy of the exception-catching version available?

what do you not like about it, or prefer about adding |false to all of the methods?

@sirreal
Copy link
Member Author

sirreal commented Feb 2, 2026

is there a copy of the exception-catching version available?

@dmsnell This diff should give an idea

what do you not like about it, or prefer about adding |false to all of the methods?

I don't feel strongly one way or the other.

Relying on error handling does reduce the checks required. My biggest concern is that it depends on methods being called inside of try/catch structures, which is easy to overlook. Handling a false return also requires some discipline, but at least the check is local and doesn't require understanding a call stack.

Some background to consider (all considering trunk):

  • bookmark_token() claims to have both behaviors (throw + return false) suggesting confusion or oversight. It's one or the other, currently throw.
    /**
    * Creates a new bookmark for the currently-matched token and returns the generated name.
    *
    * @since 6.4.0
    * @since 6.5.0 Renamed from bookmark_tag() to bookmark_token().
    *
    * @throws Exception When unable to allocate requested bookmark.
    *
    * @return string|false Name of created bookmark, or false if unable to create.
    */
    private function bookmark_token() {
  • insert_html_element() and insert_foreign_element() never fail because the token already exists with an allocated bookmark.
    private function insert_html_element( WP_HTML_Token $token ): void {
    $this->state->stack_of_open_elements->push( $token );
    }
  • insert_virtual_node() allocates a bookmark in order to construct a token. Virtual tokens are not "discovered" in the document, but arise from HTML parsing rules.
    private function insert_virtual_node( $token_name, $bookmark_name = null ): WP_HTML_Token {
    $here = $this->bookmarks[ $this->state->current_token->bookmark_name ];
    $name = $bookmark_name ?? $this->bookmark_token();
    $this->bookmarks[ $name ] = new WP_HTML_Span( $here->start, 0 );
    $token = new WP_HTML_Token( $name, $token_name, false );
    $this->insert_html_element( $token );
    return $token;

The main thing I disliked was the general Exception catching that was required. It's possible for the last_error to be set and another Exception to arise:

try {
$bookmark_name = $this->bookmark_token();
} catch ( Exception $e ) {
if ( self::ERROR_EXCEEDED_MAX_BOOKMARKS === $this->last_error ) {
return false;
}
throw $e;
}

} catch ( WP_HTML_Unsupported_Exception $e ) {
/*
* Exceptions are used in this class to escape deep call stacks that
* otherwise might involve messier calling and return conventions.
*/
return false;
} catch ( Exception $e ) {
if ( self::ERROR_EXCEEDED_MAX_BOOKMARKS === $this->last_error ) {
return false;
}
// Rethrow any other exceptions for higher-level handling.
throw $e;
}

Perhaps that check should have compared the known message:

throw new Exception( 'could not allocate bookmark' );

I also considered introducing another HTML API Exception class, but I'm reluctant to introduce another exception class for this.

@dmsnell
Copy link
Member

dmsnell commented Feb 20, 2026

The exception-throwing and catching version definitely suits me more, notably because it avoids exporting these details to the callers.

I also think this is a middle-path to excluding internal bookmarks, so spreading out the protection throughout the class seems like it’s going to be code we then have to find and remove later (though maybe the recent PHPStan changes will help surface those cases).

which is easy to overlook

If our goal is to never fail under normal usage and the failed bookmarks are only there to prevent infinite loops or memory-abuse, then perhaps we add a test to ensure that setting the bookmark fails and also that it doesn’t throw.

@sirreal
Copy link
Member Author

sirreal commented Feb 26, 2026

The exception-throwing and catching version definitely suits me more, notably because it avoids exporting these details to the callers.

I've restored the throw/catch version of these change.

perhaps we add a test to ensure that setting the bookmark fails and also that it doesn’t throw.

The PR includes tests like that. They test the processing behavior without testing the bookmark functions directly.

Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Copilot reviewed 2 out of 2 changed files in this pull request and generated no new comments.


💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Copy link
Member

@dmsnell dmsnell left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

As a stop-gap I’m approving this, but there are a number of things I am still wrestling with.

  1. We can and I believe we should update the MAX_BOOKMARKS to 10,000 or possibly 6,000 (if we think we should be overly conservative, which I don’t). My approval stands if we want to update the count in this PR. When we limited the bookmarks in the HTML Processor we didn’t know what a good subjective value would be, but I believe that we’ve learned since then that we are safe to open it up to a point where we can parse all reasonable HTML documents on the internet, which is somewhere around 3,000 or 6,000 depending on whether we want to be able to parse clearly spam content or not.
  2. This is highlighting other work you’ve done to separate the namespace of internal and external bookmarks, or of eliminating them.
  3. The tests are hard to reason about, and they are making active assertions that the processor fails in tasks we want it to succeed in. I know we have to setup the test data in ways that should hit the failure point, but I don’t like knowing the fact that an update to fix the code will cause the tests to fail. This is not the same category for me as “we want to ensure nobody fixes this one piece without also fixing these other pieces” but more like we are writing a test that will bite us later.

Because I don’t want the tests to stand in the way I am giving the approval, but I would be happy as well if we either updated the tests to focus on the behaviors we want to enforce (that user-space bookmarking hits a limit), or even if we just removed the tests for now. I am nervous is all.

But yeah, please feel encouraged to bump the MAX_BOOKMARKS in this before merge unless you know a reason not to.

WP_HTML_Processor::ERROR_EXCEEDED_MAX_BOOKMARKS,
$processor->get_last_error(),
'Failed to report exceeded-max-bookmarks error.'
);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this test confuses me.

Not sure if you noticed, but we should have $this->assertEqualsWithDelta( MAX_BOOKMARKS, $reached_tokens, 1 ); available to say "within 1 of the max"

but why the variability? and what does “depending on how the tokens align” mean? it feels flakey or not-fully-understood to me, and I get nervous adding an assertion of that in case we’re testing a happenstance detail and not a contract.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This test is very tricky, and imperfect as it is. I want to confirm that these don't throw errors during processing, which is the big thing I want to address in this PR.

It would be fine to change this to assert that no assertions are thrown.

This test in particular was difficult because the virtual tokens had another failure path. Consider:

<table><td><table><td> produces:

└─TABLE (real)
  └─TBODY (virtual)
    └─TR (virtual)
      └─TD (real)
        └─TABLE (real)
          └─TBODY (virtual)
            └─TR (virtual)
              └─TD (real)

To make this test more robust, it would probably need to process 3 times:

  • <table><td>…
  • <div><table><td>…
  • <div><div><table><td>…

Since there are pairs of real tokens TD (real) + TABLE (real), running with 0, 1, 2 offsets should ensure that the virtual token lands on the threshold at least once.

@sirreal
Copy link
Member Author

sirreal commented Feb 26, 2026

@dmsnell Thanks, I've reworked some of the testing based on your feedback.

I do plan to land #10820 to increase the max bookmarks as well.

pento pushed a commit that referenced this pull request Feb 27, 2026
When bookmark exhaustion occurs during processing, return `false` instead of throwing an `Exception`.

Developed in #10616.

Props jonsurrell, westonruter, dmsnell.
Fixes #64394.


git-svn-id: https://develop.svn.wordpress.org/trunk@61755 602fd350-edb4-49c9-b593-d223f7449a82
@github-actions
Copy link

A commit was made that fixes the Trac ticket referenced in the description of this pull request.

SVN changeset: 61755
GitHub commit: 7a58dc0

This PR will be closed, but please confirm the accuracy of this and reopen if there is more work to be done.

@github-actions github-actions bot closed this Feb 27, 2026
markjaquith pushed a commit to markjaquith/WordPress that referenced this pull request Feb 27, 2026
When bookmark exhaustion occurs during processing, return `false` instead of throwing an `Exception`.

Developed in WordPress/wordpress-develop#10616.

Props jonsurrell, westonruter, dmsnell.
Fixes #64394.

Built from https://develop.svn.wordpress.org/trunk@61755


git-svn-id: http://core.svn.wordpress.org/trunk@61061 1a063a9b-81f0-0310-95a4-ce76da25c4cd
@sirreal sirreal deleted the html-api/ensure-deep-nesting-no-exception branch February 27, 2026 15:41
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants